Templeton's Features List
The features available in Templeton have been divided into categories
for ease in browsing:
- Mirroring
- Restrictions
- Log files
- Network
- Advanced features
Several options are available when mirroring with Templeton:
- Copying. Templeton copies HTML documents, inline images, and
linked files to the local computer system. Every link traversed is retrieved,
regardless of file format. Templeton can even retrieve some clickable image
maps.
- Link rewriting. HTML documents that are copied have their links
rewritten automatically so that they may be used by local browsers without
requiring internet access. Furthermore, the links are written using
relative file names. This allows for easy file relocation (just move the
entire subtree) and for use without a local WWW server.
- Saving. Templeton stores files using either long file names or the DOS FAT
8.3 file name format. On DOS-based computer systems (including Microsoft™
Windows™ and OS/2™ using a FAT file system) the retrieved files
are stored with truncated 8.3 names. Under operating systems that support
long file names, such as OS/2 using HPFS and Unix, Templeton stores files
with long, descriptive names. You may also select the FAT 8.3 format
explicitly, for example when exporting files to DOS-based machines.
- File Overwriting. Templeton may be configured either to
overwrite files left by a previous mirror or to skip files
that already exist from a previous run.
- Simple HTML corrections. One of the most common errors in
HTML documents is the (unintentional) omission of quotation marks. Most HTML
browsers forgive this typographical error; Templeton corrects it.
- Link removal. When a hyperlink is not traversed, Templeton can be
configured either to remove the link or to leave the untraversed link in place.
- Mapping only. Sometimes it is not desirable to create a mirror
image of a web site. Templeton can be configured to map remote sites without
retrieving files.
- Server Identification. For some tasks, it is helpful to know the type of
WWW server. Templeton generates a list of server names
and the type of WWW server running on each.
- E-mail lists. By popular request, Templeton can generate a
list of all e-mail addresses that it finds. This is useful for building
automated mailing lists and collecting contact information.
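The Link rewriting item above can be illustrated with a short Python sketch. It assumes a hypothetical mirror layout in which each URL is stored under a directory named after its host (http://host/path becomes host/path, and directory URLs are saved as index.html); Templeton's actual layout is not described here, and local_path and relative_link are illustrative names. Under that assumption, a rewritten link is just the target's local path expressed relative to the directory of the page that contains the link.

```python
import posixpath
from urllib.parse import urlsplit

def local_path(url: str) -> str:
    """Map a URL to a path inside the mirror tree (hypothetical layout: host/path)."""
    parts = urlsplit(url)            # query strings and fragments are ignored in this sketch
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"         # assumed file name for directory URLs
    return parts.netloc + path

def relative_link(page_url: str, target_url: str) -> str:
    """Return the link text to write into page_url so it points at target_url's local copy."""
    page_dir = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(target_url), start=page_dir)

print(relative_link("http://www.cs.tamu.edu/faculty/index.html",
                    "http://www.cs.tamu.edu/images/logo.gif"))  # ../images/logo.gif
```

Because every rewritten link is relative, the whole mirror subtree can be moved elsewhere and still work without a WWW server, which is the relocation property described above.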
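For the Saving item, here is one plausible way to truncate a long name to DOS 8.3 form, sketched in Python. The exact rules Templeton uses are not documented here: this sketch assumes uppercase names, replaces characters outside a conservative set with underscores, and resolves collisions with the ~N suffix convention used by Windows 95 long-file-name support. to_8_3 and unique_8_3 are hypothetical names.

```python
import re

# Conservative subset of characters legal in DOS 8.3 names (an assumption of this sketch).
_INVALID = re.compile(r"[^A-Z0-9_\-]")

def to_8_3(name: str) -> str:
    """Truncate a long file name to an uppercase 8.3 name."""
    base, dot, ext = name.rpartition(".")
    if not dot:                          # no extension at all, e.g. "README"
        base, ext = name, ""
    base = _INVALID.sub("_", base.upper())[:8] or "_"
    ext = _INVALID.sub("_", ext.upper())[:3]
    return f"{base}.{ext}" if ext else base

def unique_8_3(name: str, taken: set) -> str:
    """Like to_8_3, but append ~1, ~2, ... until the short name is unused."""
    short = to_8_3(name)
    base, dot, ext = short.partition(".")
    n = 1
    while short in taken:
        suffix = f"~{n}"
        short = base[:8 - len(suffix)] + suffix + dot + ext
        n += 1
    taken.add(short)
    return short

taken = set()
print(unique_8_3("document1.html", taken))  # DOCUMENT.HTM
print(unique_8_3("document2.html", taken))  # DOCUME~1.HTM
```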
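The Simple HTML corrections item does not say exactly which quoting errors Templeton repairs. Assuming the common case of an attribute value written without quotes (for example <a href=page2.html>), a Python sketch of the repair might look like the following. quote_attributes is a hypothetical name, and the regular expressions deliberately ignore harder cases such as a '>' inside a quoted value.

```python
import re

_TAG = re.compile(r"<[^>]+>")
# Either an already-quoted string (kept as-is) or name=value with an unquoted value.
_ATTR = re.compile(r"""("[^"]*"|'[^']*')|(\s[\w-]+)=([^\s"'>][^\s>]*)""")

def _quote(match):
    if match.group(1) is not None:       # already quoted: leave untouched
        return match.group(1)
    return f'{match.group(2)}="{match.group(3)}"'

def quote_attributes(html: str) -> str:
    """Add double quotes around unquoted attribute values inside each tag."""
    return _TAG.sub(lambda tag: _ATTR.sub(_quote, tag.group(0)), html)

print(quote_attributes('<a href=page2.html>Next</a>'))  # <a href="page2.html">Next</a>
```

Matching quoted strings first means text such as title="a b=c" is consumed whole and never mistaken for an unquoted attribute.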
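For the E-mail lists item, a simple approach is to collect the addresses found in mailto: links. The Python sketch below assumes that is where the addresses come from (the text above does not say whether Templeton also recognizes bare addresses in page text); mailto_addresses is a hypothetical name.

```python
import re
from urllib.parse import unquote

# Address part of a mailto: URL, stopping at quotes, '>', whitespace, or a '?query'.
_MAILTO = re.compile(r"""mailto:([^"'>\s?]+)""", re.IGNORECASE)

def mailto_addresses(html: str) -> list:
    """Return mailto: addresses in order of first appearance, without duplicates."""
    found = []
    for raw in _MAILTO.findall(html):
        addr = unquote(raw)              # decode %-escapes such as %40
        if addr not in found:
            found.append(addr)
    return found

page = ('<a href="mailto:webmaster@cs.tamu.edu">Webmaster</a> '
        '<a href=mailto:info@example.com?subject=Hello>Info</a> '
        '<a href="MAILTO:webmaster@cs.tamu.edu">again</a>')
print(mailto_addresses(page))  # ['webmaster@cs.tamu.edu', 'info@example.com']
```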
To keep Templeton from wandering across the entire World Wide Web,
the search may be restricted. Templeton supports the following types of
restrictions.
- Host restriction. Templeton may be explicitly told not to traverse
other WWW servers. The restriction may be specified as any of the
following:
- Current host. Templeton will not traverse links that leave the
initial WWW server. This is the most common type of restriction.
- Subnet. Links within a subnet may be traversed, but WWW servers
outside of the subnet are not visited. This is especially useful when
your school or company maintains a number of WWW servers but you do not
wish to mirror the entire World Wide Web.
- Domain Name. In many cases, a company or school may exist on multiple
subnets but maintain the same domain name. By restricting to a domain name,
these servers may be mirrored or mapped without traversing the entire World
Wide Web. An example of a restricted domain name is ".intel.com", which
allows only machine names that end in ".intel.com". This would allow
"www.intel.com" and "gopher.intel.com" but not "www.intel.chips.com" or the
machine "intel.com".
(These are just examples, not necessarily real machine names.)
- Path Restriction. When restricting to a single WWW server, you
may also wish to restrict to a specific subdirectory on that server. For
example, if you are interested only in the faculty at the Texas A&M
computer science department, then you may wish to restrict to
http://www.cs.tamu.edu/faculty/. HTML documents not within the faculty
subdirectory would not be retrieved.
- Depth Restriction. Templeton processes links in a breadth-first
search pattern. In a breadth-first search, all links from a document are
traversed, then all links from the traversed documents are followed, and so
on. By restricting the depth of the search, you limit how many links away
from the starting page Templeton will go.
Use this with caution: in a breadth-first search, the number of links to
follow can grow exponentially with each additional level of depth.*
- Robot Exclusion. Applications that search the World Wide Web,
such as Templeton, are referred to as Web Robots. Many WWW servers do not
allow web robots to traverse their information. Why? Some robots
are not nice and generate so many requests in a short amount of time that the
WWW server slows to a crawl or breaks down. Other web robots try to index (or
mirror or map) proprietary, copyrighted, or temporary information. Finally
(and most commonly), some robots become stuck traversing infinite virtual
databases such as Yahoo.com, Tiger Census Maps, or Mud Games.
Templeton supports robot exclusion and can be configured to avoid
restricted paths on a server.
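The host and path restrictions above amount to simple tests on each candidate URL. The following Python sketch shows one way to express them. The function names are hypothetical, the subnet test assumes the caller has already resolved the host to an IP address, and the addresses used are reserved documentation ranges rather than real servers.

```python
import ipaddress
from urllib.parse import urlsplit

def allowed_by_domain(host: str, domain: str) -> bool:
    """Domain restriction: '.intel.com' accepts www.intel.com but not intel.com."""
    return host.lower().endswith(domain.lower())

def allowed_by_subnet(ip: str, subnet: str) -> bool:
    """Subnet restriction; ip is the already-resolved address of the host."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(subnet)

def allowed_by_path(url: str, start_url: str) -> bool:
    """Current-host plus path restriction: same host, path under the start URL's path."""
    u, s = urlsplit(url), urlsplit(start_url)
    return u.netloc.lower() == s.netloc.lower() and u.path.startswith(s.path)

print(allowed_by_domain("gopher.intel.com", ".intel.com"))     # True
print(allowed_by_domain("www.intel.chips.com", ".intel.com"))  # False
print(allowed_by_path("http://www.cs.tamu.edu/faculty/smith/",
                      "http://www.cs.tamu.edu/faculty/"))      # True
```

The leading dot in ".intel.com" matters: without it, a suffix test on "intel.com" would also accept a host such as "notintel.com". A current-host restriction is the path test with a start path of "/".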
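The Depth Restriction item can be sketched as a breadth-first search that records each page's depth and stops expanding pages once the limit is reached. In this Python sketch the Web is stood in for by a small in-memory link table (a hypothetical example; a real robot would fetch each page and extract its links), and crawl_depths is an illustrative name.

```python
from collections import deque

def crawl_depths(start, links, max_depth):
    """Breadth-first search returning {page: depth} for pages within max_depth links."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue                     # at the limit: do not follow this page's links
        for nxt in links.get(page, []):
            if nxt not in depth:         # visit each page once, at its shallowest depth
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

links = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"]}
print(crawl_depths("A", links, 2))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
```

If every page contributed k new links, level d alone could hold up to k**d pages, which is why the caution about exponential growth applies.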
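Robot exclusion is conventionally done through a robots.txt file: a server publishes /robots.txt listing paths that robots should not visit. Python's standard urllib.robotparser module implements this check, so the sketch below uses it rather than Templeton's own code, which is not described here. The rules and the "Templeton" user-agent string are example assumptions.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical); a live robot would instead call
# rp.set_url("http://host/robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /maps/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for url in ("http://www.example.com/maps/tiger?zoom=9",
            "http://www.example.com/faculty/"):
    # can_fetch() reports whether the rules let this user agent retrieve the URL.
    print(url, rp.can_fetch("Templeton", url))
```

Disallowing an infinite virtual database such as a map server's query paths is exactly the kind of rule that keeps a robot from getting stuck.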
Templeton provides a number of log files while it operates:
- Remote Mapping. This log file contains a list of each web page
that was accessed, the links found on each page, and other useful information
such as robot exclusions and unreachable links/hosts. The entry for each web
page records where the page was referenced from and how many links must be
followed to reach it.
- Local Mapping. Similar to the remote map, the local map file
records where each copied file was placed on the local file system.
- Server Identification. This optional log file maintains a list
of servers visited, including the DNS name and type of WWW server
that was found.
- Mailto Listing. This optional log file contains a list of e-mail
addresses that were found in the HTML documents and can be very useful for
generating mailing lists.
The following features control how Templeton uses the network.
- E-mail address. A good web browser/robot tells each server
"who" is running the software. This is normally your e-mail address. Since
the automatically determined address is frequently not the correct one,
Templeton allows you to modify this field.
- Proxy Support. For people who must use a proxy server to reach the
Web from behind a firewall, Templeton supports the use of a proxy server.
- Spoof Support. Some web servers refuse to pass data to
"unsupported" browsers. This is usually seen with non-Netscape browsers.
Spoofing allows Templeton to camouflage its name and appear as a different
browser.
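In HTTP terms, these settings most plausibly correspond to the following: the operator's e-mail address travels in the From request header, the browser name that spoofing changes is the User-Agent header, and a proxy is configured on the HTTP client. The Python sketch below shows this with urllib.request; build_request and make_opener are hypothetical helpers, and the address, proxy URL, and user-agent string are placeholders rather than Templeton's defaults.

```python
import urllib.request

def build_request(url, email, user_agent):
    """Create a GET request that identifies the operator and the claimed browser."""
    return urllib.request.Request(url, headers={
        "From": email,               # who is running the robot
        "User-Agent": user_agent,    # the name that spoofing would change
    })

def make_opener(proxy_url=None):
    """Return an opener that sends http:// requests through proxy_url, if given.

    With no proxy_url, urllib's default proxy handler falls back to the
    proxy environment variables (e.g. http_proxy)."""
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url}))
    return urllib.request.build_opener(*handlers)

req = build_request("http://www.example.com/", "you@example.com",
                    "Mozilla/2.0 (compatible; Templeton)")
opener = make_opener("http://proxy.example.com:8080")
# opener.open(req) would send the request; it is not called here to avoid network access.
```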
Templeton has many features that are considered "advanced."
- Templeton can execute without user interaction. This is especially
useful for automated retrieval or backups of web documents.
- Templeton has the ability to execute other applications on the retrieved
documents.
* Neal's Web Conjecture: Yahoo is reachable from within 8 links of any web page that has links to other machines.
Neal's Other Web Conjecture: You don't want to mirror or map Yahoo.
Document revision: 20 Oct. 1996 for Templeton 1.77 beta
Copyright 1996 N.A. Krawetz
Modification, republication, and redistribution of this
document is strictly prohibited. All rights reserved.